Method for determining the semantic relatedness of lexical items in a text.

A method for determining the degree to which two or more lexical items belonging to a predefined corpus of
text in any given language are semantically related to each other, comprising the following steps:

a) the retrieval from the said text corpus of a set of sentences in which one or more of the given two
or more lexical items appear,
b) the parsing, with the aid of a suitable parsing system, of each of the sentences retrieved, in order
to determine the syntactic dependency structure of each of the said sentences,
c) for each sentence retrieved, determining from the obtained syntactic dependency structure the
contextual relations which the given lexical items have in that sentence, i.e. identifying those items in the
context which have a syntactic relation to those of the given lexical items which appear in the sentence
concerned, together with the syntactic relations involved,
d) determining, for each of the given lexical items, the total number of contextual relations found in
step c),
e) determining the number of contextual relations which the given lexical items have in common,
f) determining, on the basis of the results obtained in steps d) and e), the degree of overlap between
the contextual patterns of the given two or more lexical items.
Method for determining the semantic relatedness of lexical items in a text.

The invention concerns a method for determining the degree to which two or more lexical items
(morphemes, words, collocations or phrases) belonging to a predefined text corpus in any given language
are semantically related.

Knowledge of the semantic relations between two or more lexical items in a text has applications in
various fields, including computer programs for word processing and programs for automatic translation of
texts in one natural language into texts in another natural language.

Until now it has been customary to base the determination of semantic relatedness on information
previously entered in a dictionary file. Such dictionary files contain identification codes which indicate, for
each word in the dictionary, what semantic features that word has. Alternatively, a system of classification
can be used to classify each word according to its semantic type, or the meaning of each word can be
analysed into semantic components or primitives. Although such methods are widely applied by linguistics
researchers, they are highly labour-intensive and difficult to apply consistently on a large scale owing to
subjective biases, which have a considerable influence on the determination of semantic relations by these
methods.

The present invention has the aim of showing how the semantic relatedness of two or more lexical
items can be determined automatically, without involving the personal judgement of the user.

This aim is achieved, according to the invention, through a method for determining the degree to which
two or more lexical items belonging to a predefined text corpus in any given language are semantically
related, comprising the following steps:

a) the retrieval from the said text corpus of a set of sentences in which one or more of the given two
or more lexical items appear,
b) the parsing, with the aid of a suitable parsing system, of each of the sentences retrieved, in order
to determine the syntactic dependency structure of each of the said sentences,
c) for each sentence retrieved, determining from the obtained syntactic dependency structure the
contextual relations which the given lexical items have in that sentence, i.e. identifying those items in the
context which have a syntactic relation to those of the given lexical items which appear in the sentence
concerned, together with the syntactic relations involved,
d) determining, for each of the given lexical items, the total number of contextual relations found in
step c),
e) determining the number of contextual relations which the given lexical items have in common,
f) determining, on the basis of the results obtained in steps d) and e), the degree of overlap between
the contextual patterns of the given two or more lexical items.

As a result of this method an indication is obtained of the strength of the semantic relation between the
given two or more lexical items. This allows a word processing program, an automatic translation program
or any other such program to make an independent and automatic decision, and to carry out other
processing steps on the basis of that decision.

Although there are a number of methods of statistical analysis which can be applied in order to 
compute the measure of semantic relatedness, the preferred method is to split step f) into two parts: 

f1) determining the number of common contextual relations which can be expected by chance alone,
f2) comparing the number obtained by step f1) with the number obtained by step e).

The comparison in step f2) should preferably be performed by evaluating the following formula:

semantic relatedness = (C-E)/(C+K),

where
C = the number of common contextual relations,
E = the number of common contextual relations which can be expected by chance alone, as obtained by
step f1),
K = a constant.

Although the method according to the invention can in many cases yield good results even with a
limited number of sentences extracted from the text corpus, it will usually be preferable to retrieve from the
text corpus, in step a), all sentences in which one or more of the given lexical items appears. The degree of
semantic relatedness between the given two or more lexical items can be determined with the highest
degree of confidence when all the contextual relations of the said lexical items are taken into account, in
other words when all sentences in which one or more of the given lexical items appears are retrieved from
the text corpus.

The invention will now be described in greater detail with the aid of some examples of its application.
Example 1

Measuring the semantic proximity, or semantic distance, between the words DISCARD and REMOVE.

As an example of the method according to the invention, in what follows the semantic proximity
between two words is determined on the basis of a number of sentences extracted from an aircraft
maintenance manual. In this example only a few sentences are used for each of the two key words, but it
will be obvious that as many sentences as possible should be used in order to obtain reliable results, and
that preferably the method should be based on all those sentences in the whole text corpus (in this case the
whole maintenance manual) which contain one or both of the key words. In the present example the aim is
to determine the semantic proximity between the words DISCARD and REMOVE. The following five
sentences were retrieved from the manual:

[1] Remove and DISCARD the O-rings (9 and 12).
[2] Remove and DISCARD the split pins (18) and remove the nuts (17) and washers (16) from the
clamp rods (11).
[3] DISCARD the gasket (9).
[4] Remove and DISCARD the two split pins which safety the autopilot cable end fittings (21).
[5] DISCARD the lockwire from the glandnuts (2).
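
The retrieval in step a) can be sketched as a simple filter over the sentences of the corpus. This is only an illustration: the function name `retrieve_sentences` is hypothetical, the tiny corpus reuses sentences quoted in this document, and matching inflected forms by prefix (so that REMOVE also finds "REMOVEd") is an assumption standing in for proper lemmatization.

```python
import re


def retrieve_sentences(corpus, key_words):
    """Step a): return the sentences containing at least one key word."""
    # Assumption: a key word also matches inflected forms that extend it
    # (REMOVE matches "REMOVEd"); a real system would lemmatize instead.
    patterns = [
        re.compile(r"\b" + re.escape(word) + r"\w*", re.IGNORECASE)
        for word in key_words
    ]
    return [s for s in corpus if any(p.search(s) for p in patterns)]


# A stand-in corpus built from sentences quoted in this document.
corpus = [
    "DISCARD the gasket (9).",
    "Do not REMOVE the nuts (5).",
    "When power to main ac bus 1 (2) is REMOVEd, the following events occur.",
    "The spool VALVE supplies PRESSURE to the hydraulic motor.",
]

discard_set = retrieve_sentences(corpus, ["discard"])
either_set = retrieve_sentences(corpus, ["discard", "remove"])
```

Passing both key words at once returns every sentence in which at least one of them occurs, which is the set the method asks for.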

With the identification and retrieval of these sentences, step a) of the method according to the invention
has been partially completed. (Step a) also involves retrieving the sentences
containing the word REMOVE, and this part will be discussed below.) Next, as defined in step b) of the
method, each of the sentences retrieved must be parsed with the aid of a suitable parsing system in order
to determine the syntactic dependency structure of each sentence. Such syntactic analysers or parsers
require no further explanation for a specialist in this field. For example, the last sentence of the above set
might be converted by one of the known types of parser to a syntactic dependency tree with the following
results:

[GOVERNOR, 'discard',
    [DIRECT-OBJECT, 'lockwire',
        [DETERMINER, 'the']],
    [PREPOSITIONAL-ADJUNCT, 'from',
        [PREPOSITIONAL-ARGUMENT, 'glandnut',
            [...],
            [EPITHET, ...]]]]

(The linguistic terms used in the above representation are assumed to be familiar to a specialist in this field
and to need no further elucidation.)

The key word (or words, if both key words happen to occur in the same sentence) can now be
extracted from this dependency structure, together with those elements of the context which have a direct
relation to the key word (or words). For example, from the above dependency structure for sentence No. 5 it
is possible to determine that the key word DISCARD has a direct relation to the word "lockwire", which is
labelled "DIRECT-OBJECT". Such contextual relations can be extracted from the obtained dependency
structure for each sentence in turn.
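
A minimal sketch of this extraction is given below. The node layout `(label, word, children)` is an assumption chosen for illustration and is not the output format of any particular parser; the tree is a simplified version of the structure for sentence No. 5, and the helper names are hypothetical. Only relations to direct dependents are collected; the relation of a key word to its own governor would need one more lookup.

```python
def find_node(node, key_word):
    """Depth-first search for the first node whose word equals key_word."""
    label, word, children = node
    if word == key_word:
        return node
    for child in children:
        found = find_node(child, key_word)
        if found is not None:
            return found
    return None


def direct_relations(tree, key_word):
    """Return (key word, relation label, context word) for each dependent."""
    node = find_node(tree, key_word)
    if node is None:
        return []
    return [(key_word, child[0], child[1]) for child in node[2]]


# Simplified tree for "DISCARD the lockwire from the glandnuts (2)."
tree = ("GOVERNOR", "discard", [
    ("DIRECT-OBJECT", "lockwire", [("DETERMINER", "the", [])]),
    ("PREPOSITIONAL-ADJUNCT", "from", [
        ("PREPOSITIONAL-ARGUMENT", "glandnut", []),
    ]),
])

relations = direct_relations(tree, "discard")
```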

In addition, the dependency structures obtained are also searched for any indirect relation either of the
key words may have to another word in its context via a function word such as a preposition or conjunction.
In the dependency structure which would be obtained for sentence No. 1, for example, the key word
DISCARD would be found to have an indirect relation to the other key word REMOVE via the conjunction
AND.

The result obtained by tabulating all the relations which can be found for the above-mentioned key
words in the syntactic dependency structures corresponding to the above sentences is as follows:

Sentence  Relation no.  First word  Relation  Second word
1         1             remove      AND       discard
1         2             discard     OBJECT    ring
2         1             remove      AND       discard
2         2             discard     OBJECT    pin
3         1             discard     OBJECT    gasket
4         1             remove      AND       discard
4         2             discard     OBJECT    pin
5         1             discard     OBJECT    lockwire

The number in the first column of each row in the above table shows the number of the sentence,
corresponding to the numbers used in the above list of sentences; and the number in the second column
shows the serial number of the relation found in the given sentence, in which one or both of the key words
appear. It can be seen that in a few cases a relation exists between the two key words themselves.

A wholly identical procedure can now be followed for the second key word, REMOVE. The following set
of five sentences can be extracted from the manual for this purpose:

[1] Lift the loosened bus-bars (7) from the terminal studs (6) and REMOVE the contactor (14) from
the interface (12).
[2] When power to main ac bus 1 (2) is REMOVEd, the following events occur.
[3] Do not REMOVE the nuts (5).
[4] REMOVE the lockwire and REMOVE the sensor connector (9) from the receptacle (10).
[5] REMOVE and discard the split pins (18) and REMOVE the nuts (17) and washers (16) from the
clamp rods (11).

After each of these sentences has been subjected to structural analysis and the respective syntactic
dependency structures have been obtained, the following relations can be extracted:

Sentence  Relation no.  First word  Relation  Second word
1         1             lift        AND       remove
1         2             remove      OBJECT    contactor
1         3             remove      FROM      interface
2         1             remove      OBJECT    power
3         1             remove      OBJECT    nut
4         1             remove      OBJECT    lockwire *
4         2             remove      AND       remove
4         3             remove      OBJECT    connector
4         4             remove      FROM      receptacle
5         1             remove      AND       discard
5         2             remove      OBJECT    pin *
5         3             remove      AND       remove
5         4             remove      OBJECT    nut
5         5             remove      OBJECT    washer
5         6             remove      FROM      rod


Here too, relations are found between the key word itself (REMOVE) and various other words, but also
between REMOVE and the other key word DISCARD.

It also appears from the two tables above that both key words have common relations to identical words
in their context, as shown in the second table by an asterisk. Thus, for instance, the word "pin" appears in
the OBJECT relation both to DISCARD and to REMOVE.
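
The comparison of the two tables can be sketched as a set intersection over (relation, context word) pairs. The lists below are transcribed from the tables above, keeping only the rows in which the key word is the first word and the second word is not itself a key word. Treating repeated pairs as a single relation is an assumption of this sketch, since the text does not say whether repetitions are counted separately.

```python
# (relation, context word) pairs for each key word, transcribed from the
# two tables; relations between the key words themselves are left out.
discard_relations = [
    ("OBJECT", "ring"), ("OBJECT", "pin"), ("OBJECT", "gasket"),
    ("OBJECT", "pin"), ("OBJECT", "lockwire"),
]
remove_relations = [
    ("OBJECT", "contactor"), ("FROM", "interface"), ("OBJECT", "power"),
    ("OBJECT", "nut"), ("OBJECT", "lockwire"), ("OBJECT", "connector"),
    ("FROM", "receptacle"), ("OBJECT", "pin"), ("OBJECT", "nut"),
    ("OBJECT", "washer"), ("FROM", "rod"),
]


def common_relations(first, second):
    """Step e): relations that both key words have to the same context word."""
    # Assumption: a pair seen several times counts as one relation.
    return set(first) & set(second)


common = common_relations(discard_relations, remove_relations)
```

With these five sentences per key word, the overlap consists of the OBJECT relations to "pin" and "lockwire".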

A comparison of the above two tables clearly shows that identifying the syntactic relations in the context
makes it possible to find meaningful similarities in the contextual patterns of semantically related words
such as, in the present example, the words DISCARD and REMOVE.

Even with the limited number of sentences used in this example, a number of common contextual
elements already appear. If the whole text is processed, and all the sentences are extracted in which at
least one of the key words occurs, then the total number of common contextual elements will certainly
increase. The more contextual relations the two key words have in common, the smaller will be the
semantic distance between them, or, in other words, the stronger is the similarity or identity between the
meanings or fields of reference of the two words. In accordance with the method as defined by the
invention, statistical methods can now be applied to the above-mentioned lists of relations in order to arrive
at a numerical measure of this semantic proximity.

This measure of semantic proximity should be a function of:

(a) the number of contextual relations the words being compared have in common, and
(b) the number of contextual relations which can be found, for each of the key words, in the selected
set of sentences. (Ideally, the selected set of sentences should be the total text corpus.)

Thus in the above example the semantic proximity of the words DISCARD and REMOVE depends not
only on the number of common relations, such as the OBJECT relation in which "pin" stands to
both words, but also on the total number of contextual relations the words DISCARD and REMOVE have in
the text corpus which serves as the source of lexical knowledge.

There are a large number of possible statistical methods of expressing the degree of semantic
proximity between two words. The preferred method, however, is to compute the semantic relatedness
mentioned in step f) by subtracting from the number of relations obtained in step e) the number which can
be expected by chance alone, and then dividing the result by the number obtained in step e), increased by
a constant. In other words, the formula applied is:

semantic proximity = (C-E)/(C + K),

where
C = the number of common contextual relations,
E = the number of such relations which can be expected by chance alone,
K = a constant.

The number of relations to be expected on the basis of chance alone is in theory given by

E = A · B/f(N),

where
A = the number of relations found for the first word,
B = the number of relations found for the second word,
f(N) = a function of the number of different relations, N, in the text corpus.

Suppose that for the word DISCARD in the present example a total of 300 contextual relations are found
in the text, that for the word REMOVE a total of 500 relations are found, and that 50 of these relations are
common to both words. Suppose further that for the function f(N) of the number of different relations in
the corpus of text a value of 15000 has been established experimentally, and that for the constant K a value
of 1 is chosen. The number of common relations to be expected on the basis of chance alone is
determined by the above formula as:

E = A · B/f(N) = 300 · 500/15000 = 10.

In accordance with the first of the above formulae, a numerical value can now be obtained for the
measure of semantic relatedness, or semantic proximity in this case, of the two words DISCARD and
REMOVE:

semantic proximity = (C-E)/(C + K) = (50-10)/(50 + 1) = 0.78.
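
The arithmetic of this worked example can be checked with a few lines of code. The function names are hypothetical, and the value K = 1 is an assumption read from a damaged passage of the source; the other figures (300, 500, 50 and 15000) are those given in the text.

```python
def expected_by_chance(a, b, f_n):
    """E = A * B / f(N): common relations expected by chance alone."""
    return a * b / f_n


def semantic_proximity(c, e, k):
    """Preferred measure: (C - E) / (C + K)."""
    return (c - e) / (c + k)


A, B, C = 300, 500, 50   # totals for DISCARD and REMOVE, and their overlap
F_N = 15000              # experimentally established value of f(N)
K = 1                    # assumed value of the constant

E = expected_by_chance(A, B, F_N)
proximity = semantic_proximity(C, E, K)
print(E, round(proximity, 3))  # prints: 10.0 0.784
```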

In practice f(N) cannot be derived directly from N, because the distribution of
contextual relations is not even, and because it is subject to various kinds of constraint, such as
part of speech, for example. However, the value of f(N) can also be set experimentally by choosing the
value which yields the most acceptable results.

The value of K also depends on the application of the method. This constant has a normalizing effect
first and foremost. Adding the constant to the denominator of the above expression causes the semantic
relatedness to be expressed by a number between zero and unity. On the other hand, this constant also
has the effect of reducing the measure of semantic relatedness when this is based on a very low value of C
(i.e. a value which indicates that the number of common relations is small). This effect can be useful for
limiting the influence of chance coincidences: if the numbers are relatively small, then in general the
conclusions which can be drawn from them will be less reliable.

It may happen that no common contextual relations are found for the given lexical items, although
a certain number of common relations would be expected on the grounds of chance alone. In that case the
measure of semantic relatedness acquires a negative value. It is preferable in such cases to replace the
term C in the denominator of the above expression with the term E, so that the values obtained will be
normalized between zero and minus one. The formula then becomes:

relatedness = (C-E)/(E + K).
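
The two variants of the formula can be combined into one function that switches denominator when fewer common relations are found than chance would predict. This is a sketch under stated assumptions: the function name is hypothetical, and the sample values E = 10 and K = 1 are illustrative only.

```python
def relatedness(c, e, k):
    """(C - E)/(C + K) when C >= E, otherwise (C - E)/(E + K)."""
    # Using E in the denominator for the negative case keeps the result
    # between minus one and zero, mirroring the zero-to-one positive range.
    denominator = (c if c >= e else e) + k
    return (c - e) / denominator


no_overlap = relatedness(0, 10, 1)    # no common relations, 10 expected
with_overlap = relatedness(50, 10, 1)
```

With these inputs the first call gives -10/11 and the second 40/51, so both stay inside the interval from minus one to one.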

Another possible way of expressing the degree of semantic relatedness between two words is to divide
the number of common relations, C, by the sum of the total number of relations, A, found for the first word
and the total number of relations, B, found for the second word. The result is a numerical value which
expresses the semantic relatedness of the two words. In other words:

relatedness = C/(A + B),

where
A = the total number of relations for the first word,
B = the total number of relations for the second word,
C = the number of common relations.

This formula yields a value which, depending on the numbers involved, will lie between 0 and 1/2 for
two key words, or between 0 and 1/3 for three key words. Since there is a theoretical upper limit for
semantic relatedness (namely complete synonymity), it is convenient to again normalize the measure of
relatedness between zero and unity, as in the preferred method discussed above. This can be done by
multiplying the numerator in the above expression by the number of key words involved in the comparison.
Thus, in general:

relatedness = (number of key words) · C/(A + B + ...).

Suppose once more that for the word DISCARD in the present example a total of 300 contextual
relations are found in the text, that for the word REMOVE a total of 500 relations are found, and that 50 of
these relations are common to both words. The numerical measure of semantic relatedness, or semantic
proximity in this case, for the two words DISCARD and REMOVE is given by 2 · 50/(300 + 500) = 0.125.
The larger the number of common relations, the closer the measure of relatedness obtained approaches
unity.
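
This alternative measure is short enough to state directly in code. The function name is hypothetical; the figures are those of the example in the text.

```python
def overlap_relatedness(common, totals):
    """(number of key words) * C / (sum of the per-word relation totals)."""
    return len(totals) * common / sum(totals)


value = overlap_relatedness(50, [300, 500])      # DISCARD and REMOVE
upper = overlap_relatedness(300, [300, 300])     # every relation shared
```

The second call illustrates the normalization: when two words share all of their relations, the measure reaches unity.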

Such a measure of semantic distance, or proximity, can be used in the production of
machine translations, for example. By way of illustration, the English word "smooth" and its various French
translations will be considered. The word "smooth" has a number of possible equivalents in French, with
clearly different meanings.

In such cases as this, where a single word can be translated into another language in several different
ways, with different meanings, it is common practice in conventional dictionaries to provide the entry in
question with a number of codified contextual references, together
with the relevant meanings or translations, e.g.:

smooth (leather) = lisse
smooth (road) = uni
smooth (glass) = poli
smooth (skin) = doux
smooth (talk) = insinuant

The problem then is to deduce from the text being translated which of the meanings is appropriate in
the current context and thus how the word in question is to be translated. For instance, if the word
"smooth" appears in the combination "smooth path", the system needs to be able to decide which of the
translations given in the dictionary is most appropriate, i.e. which of the contextual examples fits best in the
context of "path". In this example, the most appropriate French word will presumably be "uni". Now if a text
corpus is searched using the method defined by the invention, a semantic proximity index can be worked
out for each of the contextual examples in the dictionary, and this will show that, in view of the number of
common relations found, there is a high degree of semantic proximity between the words "path" and
"road", whereas the measure of proximity to the other dictionary examples will be much lower. On these
grounds the system can decide that the French word "uni" is the correct translation of "smooth".
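
The selection step can be sketched as picking the dictionary example with the highest proximity to the word in the text. Everything beyond the dictionary entries is an assumption: `choose_translation` is a hypothetical helper, and the scores in `toy_scores` are invented placeholders used only to exercise the function, not measurements from any corpus.

```python
def choose_translation(context_word, entries, proximity):
    """Return the translation whose contextual example is closest to context_word."""
    best_cue = max(entries, key=lambda cue: proximity(context_word, cue))
    return entries[best_cue]


# Contextual examples and translations of "smooth", as listed in the text.
smooth = {
    "leather": "lisse",
    "road": "uni",
    "glass": "poli",
    "skin": "doux",
    "talk": "insinuant",
}

# Placeholder proximity scores to "path" (invented for illustration only).
toy_scores = {"leather": 0.1, "road": 0.8, "glass": 0.1, "skin": 0.05, "talk": 0.0}


def toy_proximity(word, cue):
    # A real system would compute this from the corpus with the method above.
    return toy_scores[cue]


translation = choose_translation("path", smooth, toy_proximity)
```

Keeping the proximity measure as a parameter means either of the formulas discussed earlier could be plugged in without changing the selection logic.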
This example shows why the number of common relations must be considered in relation to the total
number of relations found for each word. If words A and B have 50 relations in common, for instance,
whereas words A and C have only 10 relations in common, then the conclusion can be drawn that A is
closer in meaning to B than to C, always provided that the total number of relations found in the text is the
same for B as for C. If, on the other hand, the totals are different, this factor must be taken into account.
The finding of 10 common relations between A and C may be statistically more significant than the 50
common relations between A and B, if B is a high-frequency word such as "road" and C is a relatively rare
word, e.g. "gasket".
Example 2

Measuring the degree of semantic association between two words such as PRESSURE and VALVE.

Before this example is discussed in detail it must be pointed out that there is a difference between
semantic association and semantic proximity, although both are types of semantic relatedness. The words
PRESSURE and VALVE are certainly not similar in meaning, one word (pressure) referring to an abstract
concept and the other (valve) referring to a concrete piece of equipment. The semantic distance between
them should therefore be relatively large, i.e. the numerical measure of semantic proximity should be low.
However, the method described above can also be successfully applied to determine the degree of
semantic association instead of semantic distance or proximity, as will be illustrated below.

Just as in example 1, the two key words PRESSURE and VALVE are used to retrieve from a corpus of
text that set of sentences in which at least one of the key words occurs. This time, however, only those
sentences are retained in which both key words appear. Ten such sentences extracted from a sample text
are shown below:

[1] A temperature-compensated PRESSURE switch, a VALVE and a safety device are installed on
the bottle.
[2] The spool VALVE supplies PRESSURE to the hydraulic motor.
[3] If the isolation VALVE cuts off the PRESSURE to the system, application of the brake is automatic.
[4] The PRESSURE goes through the second-stage poppet of the shutoff VALVE to the high
PRESSURE ports of the spool VALVE.
[5] A PRESSURE relief VALVE prevents an overpressure in the hydraulic [...].
[6] A bleed-air regulating and relief VALVE controls the air PRESSURE in the system reservoir.
[7] The offloader VALVE decreases the PRESSURE to 27[...] - 3430 kPa (400-500 psi) if the
hydraulic systems are not used.
[8] Two vacuum relief VALVEs prevent a negative PRESSURE.
[9] The selector VALVE supplies oil PRESSURE to move the piston in the control cylinder.
[10] The system-accumulator nitrogen line [...] gas chamber of the system accumulator to
its charging VALVE and its PRESSURE gage.

Again each of these sentences must be analysed with the aid of a parsing system in order to establish
its syntactic dependency structure. Once the syntactic structures are available, each of the structures
is examined to determine whether:

1) the two key words are directly connected in the syntactic structure, or
2) the two key words are linked to each other by some intervening node.

The following table shows the kind of information which can be extracted from such structures after
each of the sentences has been parsed and the corresponding parse structure has been established.

 1  switch "," valve            +  switch ATTRIBUTE pressure
 2  supply SUBJECT valve        +  supply OBJECT pressure
 3  cut SUBJECT valve           +  cut OBJECT pressure
 4  port OF valve               +  port ATTRIBUTE pressure
 5  valve ATTRIBUTE relief      +  relief ATTRIBUTE pressure
 6  control SUBJECT valve       +  control OBJECT pressure
 7  decrease SUBJECT valve      +  decrease OBJECT pressure
 8  prevent SUBJECT valve       +  prevent OBJECT pressure
 9  supply SUBJECT valve        +  supply OBJECT pressure
10  valve AND gage              +  gage ATTRIBUTE pressure
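
The rows of this table can be tallied by association pattern to see which kind of link between VALVE and PRESSURE occurs most often. The tuple layout is an assumption of this sketch and flattens the direction of each relation. Rows 6 and 9, whose right-hand halves are damaged in the source, are read here as SUBJECT/OBJECT patterns on the basis of the surrounding text.

```python
from collections import Counter

# (sentence, linking word, relation on the VALVE side, relation on the
# PRESSURE side), transcribed from the table above.
rows = [
    (1, "switch", ",", "ATTRIBUTE"),
    (2, "supply", "SUBJECT", "OBJECT"),
    (3, "cut", "SUBJECT", "OBJECT"),
    (4, "port", "OF", "ATTRIBUTE"),
    (5, "relief", "ATTRIBUTE", "ATTRIBUTE"),
    (6, "control", "SUBJECT", "OBJECT"),
    (7, "decrease", "SUBJECT", "OBJECT"),
    (8, "prevent", "SUBJECT", "OBJECT"),
    (9, "supply", "SUBJECT", "OBJECT"),
    (10, "gage", "AND", "ATTRIBUTE"),
]

# Count how often each pair of relations links the two key words.
patterns = Counter((valve_rel, pressure_rel) for _, _, valve_rel, pressure_rel in rows)
dominant, count = patterns.most_common(1)[0]

# Linking words that realise the dominant pattern.
verbs = sorted({word for _, word, v, p in rows if (v, p) == dominant})
```

The tally finds the SUBJECT/OBJECT pattern in six of the ten rows, realised by the verbs "control", "cut", "decrease", "prevent" and "supply".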

As the table shows, the words PRESSURE and VALVE, although dissimilar in meaning, are nevertheless
linked to each other by their relations to other words such as "switch", "supply", "cut", "port", "relief",
"control", "decrease", "prevent" and "gage". Identifying these syntactic connections in the context makes it
possible not only to estimate the degree or strength of association between any given words, but also to
identify the kind of association involved. It is immediately clear from the above table that the dominating
type of association is that in which VALVE is the subject, and PRESSURE the direct object, of some
common verb. The actual verbs encountered in this relation in the above table are "supply", "cut",
"control", "decrease" and "prevent"; and these provide a clear characterization of the function of a valve
with regard to pressure.

This potential application of the method according to the invention proves particularly valuable for
making a choice in cases of ambiguity in collocations with an implicit relation, such as noun strings in
English. In the above example it so happened that in the sentences retrieved, only indirect relations were
found between the two key words, but a direct relation might well have been found in the corpus, as in the
collocation "pressure valve". This would incidentally have strengthened the index of association between
the two words. The explicit characterization of that association is obtained from the indirect connections
shown above. Just as in example 1, the degree or strength of the association between two words can be
numerically expressed as a function of the number of connecting relations found between the two words
and as a function of the total number of relations for the words themselves.

The degree of semantic association, when expressed in a suitable form, also has a role to play in
machine translation programs. This can be illustrated with the following example sentences:

[1] Remove the pins from the bandages.
[2] Remove the pins from the bolts.

If in the language into which these English sentences are to be translated (e.g. Dutch) it is necessary to
clearly differentiate between different translations of the word "pin" (e.g. the Dutch word "speld", meaning a
'sharp-pointed fastener' in the first sentence, and Dutch "splitpen", meaning 'a kind of peg' in the second
sentence), then in the course of translation a point will be reached at which a choice has to be made. The
relation between the word "pin" and the word "remove" does not help in this case, because both kinds of
pin can equally well be removed. The solution of the problem of word choice thus depends on establishing
a link between one of the alternative translations of "pin" and the translation of "bandage", and between
one of the alternative translations of "pin" and the translation of "bolt". In other words, the choice depends
on the degree of association between the above-mentioned words as determined on the basis of the
contextual patterns they exhibit in the target language (the language into which the text is being translated).

If the degree of this association is determined using the method according to the invention, it will
appear that the Dutch word for "bandages" has a stronger association with the Dutch word "speld" than it
does with the word "splitpen". On the other hand, the Dutch word for "bolts" will show a stronger
association with the word "splitpen" than it does with the word "speld". Thus, on the basis of the strength
of the observed association, a correct choice can be made for the translation of the ambiguous word "pin".
The stronger the association between the relevant words, the greater the confidence with which this choice
can be made.

Claims 

1. A method for determining the degree to which two or more lexical items belonging to a predefined
corpus of text in any given language are semantically related to each other, comprising the following steps:

a) the retrieval from the said text corpus of a set of sentences in which one or more of the given two
or more lexical items appear,
b) the parsing, with the aid of a suitable parsing system, of each of the sentences retrieved, in order
to determine the syntactic dependency structure of each of the said sentences,
c) for each sentence retrieved, determining from the obtained syntactic dependency structure the
contextual relations which the given lexical items have in that sentence, i.e. identifying those items in the
context which have a syntactic relation to those of the given lexical items which appear in the sentence
concerned, together with the syntactic relations involved,
d) determining, for each of the given lexical items, the total number of contextual relations found in
step c),
e) determining the number of contextual relations which the given lexical items have in common,
f) determining, on the basis of the results obtained in steps d) and e), the degree of overlap between
the contextual patterns of the given two or more lexical items.

2. A method according to claim 1, characterized in that step f) is subdivided into two parts:

f1) determining the number of common contextual relations which can be expected by chance alone,
f2) comparing the number obtained by step f1) with the number obtained by step e).

3. A method according to claim 2, characterized in that the comparison in step f2) is performed by
evaluating the following formula: semantic relatedness = (C-E)/(C + K), where

C = the number of common contextual relations obtained by step e),
E = the number of relations to be expected by chance alone, as obtained by step f1),
K = a constant.

4. A method according to claim 2, characterized in that, where the number of common contextual
relations to be expected by chance alone, as obtained by step f1), is larger than the number of common
relations obtained by step e), the comparison in step f2) is performed by evaluating the following formula:
relatedness = (C-E)/(E + K).

5. A method according to claim 2, 3 or 4, characterized in that the result of step f1) is determined by
evaluating the following formula: E = A · B/f(N), where

A = the number of relations obtained in step d) for the first lexical item,
B = the number of relations obtained in step d) for the second lexical item,
f(N) = a function of the number of different relations, N, in the total above-mentioned predefined corpus of
text.

6. A method according to claim 1, characterized in that the degree of contextual overlap mentioned in
step f) is obtained by determining the sum of the numbers of common relations obtained by step d) for the
individual lexical items, and then dividing the result by the number obtained by step e).

7. A method according to claim 6, characterized in that the said sum is multiplied by the number of
lexical items for which the degree of relatedness is being determined.